{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Generating, Indexing and Searching Embeddings (Experimental)\n",
    "\n",
    "**WARNING: The feature introduced in this tutorial is currently experimental. It does not have any API stability guarantee.**\n",
    "\n",
    "## Installing the Package\n",
    "\n",
    "For testing purpose, let's install the latest development version:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/gpadmin/GreenplumPython\n",
      "Defaulting to user installation because normal site-packages is not writeable\n",
      "Processing /home/gpadmin/GreenplumPython\n",
      "  Installing build dependencies ... \u001b[?25ldone\n",
      "\u001b[?25h  Getting requirements to build wheel ... \u001b[?25ldone\n",
      "\u001b[?25h    Preparing wheel metadata ... \u001b[?25ldone\n",
      "\u001b[?25hRequirement already satisfied, skipping upgrade: psycopg2-binary==2.9.5 in /home/gpadmin/.local/lib/python3.9/site-packages (from greenplum-python==1.0.1) (2.9.5)\n",
      "Requirement already satisfied, skipping upgrade: dill==0.3.6 in /home/gpadmin/.local/lib/python3.9/site-packages (from greenplum-python==1.0.1) (0.3.6)\n",
      "Building wheels for collected packages: greenplum-python\n",
      "  Building wheel for greenplum-python (PEP 517) ... \u001b[?25ldone\n",
      "\u001b[?25h  Created wheel for greenplum-python: filename=greenplum_python-1.0.1-py3-none-any.whl size=71903 sha256=305b83c461fb90310fafe09821f5778ef5235a439aee59ee3c1f304e349188d6\n",
      "  Stored in directory: /tmp/pip-ephem-wheel-cache-w_h4u4oe/wheels/bb/1f/99/ff8594e48ec11df99af6e0ee8611a5e560e9f44d1a3fefb351\n",
      "Successfully built greenplum-python\n",
      "Installing collected packages: greenplum-python\n",
      "Successfully installed greenplum-python-1.0.1\n"
     ]
    }
   ],
   "source": [
    "%cd ../../../\n",
    "!python3 -m pip install --upgrade ."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing Data\n",
    "\n",
    "With GreenplumPython install, let's create a table with some sample text data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "content = [\"I have a dog.\", \"I like eating apples.\"]\n",
    "\n",
    "import greenplumpython as gp\n",
    "\n",
    "db = gp.database(\"postgresql://localhost:7000\")\n",
    "t = (\n",
    "    db.create_dataframe(columns={\"id\": range(len(content)), \"content\": content})\n",
    "    .save_as(\n",
    "        table_name=\"text_sample\",\n",
    "        column_names=[\"id\", \"content\"],\n",
    "        distribution_key={\"id\"},\n",
    "        distribution_type=\"hash\",\n",
    "        drop_if_exists=True,\n",
    "        drop_cascade=True,\n",
    "    )\n",
    "    .check_unique(columns={\"id\"})\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generating and Indexing Embeddings\n",
    "\n",
    "On the text sample table, we can now create an embedding index with the new `embedding` module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "\t<tr>\n",
       "\t\t<th>id</th>\n",
       "\t\t<th>content</th>\n",
       "\t</tr>\n",
       "\t<tr>\n",
       "\t\t<td>0</td>\n",
       "\t\t<td>I have a dog.</td>\n",
       "\t</tr>\n",
       "\t<tr>\n",
       "\t\t<td>1</td>\n",
       "\t\t<td>I like eating apples.</td>\n",
       "\t</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "----------------------------\n",
       " id | content               \n",
       "----+-----------------------\n",
       "  0 | I have a dog.         \n",
       "  1 | I like eating apples. \n",
       "----------------------------\n",
       "(2 rows)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import greenplumpython.experimental.embedding\n",
    "\n",
    "t = t.embedding().create_index(column=\"content\", model_name=\"all-MiniLM-L6-v2\")\n",
    "t"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will generate embeddings for the text data using the specified model and create vector index on the embeddings for fast k-NN search."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generating Embeddings without Indexing \n",
    "\n",
    "If we just want to generate the embeddings without creating a vector index, we can use the function `create_embedding()` from the `embedding` module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from greenplumpython.experimental.embedding import create_embedding"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since we're not indexing vectors, the dataframe doesn't need to be stored as a table in the database. And we do not need to specify the unique key if the embeddings are in the same dataframe.\n",
    "\n",
    "Furthermore, if we want to save the embeddings as vector type with embedding dimension, so that we can index them later, we need to cast the result to type `gp.type_(\"vector\", modifier=<embedding_dimension>)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "\t<tr>\n",
       "\t\t<th>id</th>\n",
       "\t\t<th>content</th>\n",
       "\t\t<th>embedding_col</th>\n",
       "\t</tr>\n",
       "\t<tr>\n",
       "\t\t<td>0</td>\n",
       "\t\t<td>I have a dog.</td>\n",
       "\t\t<td>[-0.03659846,-0.012087725,0.08805456,0.06115138,-0.043457743,-0.01559289,0.07047544,-0.002039723,0.08257614,-0.027372131,0.04414288,-0.03269939,0.013636172,0.041616578,0.01041031,-0.0015929971,-0.06705982,-0.04409838,-0.0057846354,-0.064064376,-0.065876596,0.07500749,0.012162328,-0.005788606,-0.10990597,0.027304182,-0.039163485,-0.05016219,0.0029829051,-0.03839179,-0.015229771,-0.055909265,-0.011802612,-0.004877074,-0.042732246,-0.041694522,0.0065653333,-0.013692751,0.10301593,0.080455385,0.04717231,0.014515034,0.06301975,-0.008371313,-0.0037640464,0.037010957,-0.08730185,-0.019860014,0.116165005,-0.00917515,-0.029422058,0.057609066,-0.017986145,0.030363863,-0.018659862,-0.02392018,0.0075364686,0.030293366,-0.0017754078,-0.02729239,0.010452815,0.06776974,0.009428493,0.0472378,0.00020276332,0.020744024,-0.06177006,0.06334934,-0.0663036,0.055175003,0.036360443,0.03362923,0.04471841,0.09541594,-0.03548978,-0.107487485,0.06386296,-0.030471282,0.1804235,0.07920901,-0.08705953,-0.06174665,-0.042777628,0.04772076,0.0404552,0.011489488,0.07283992,0.06658615,-0.117522426,0.011569878,-0.02257866,-0.049202636,-0.03411388,0.017634219,-0.0032649843,-0.01003333,-0.022944538,-0.03394829,-0.021662673,0.08960133,0.0084434915,0.028806066,0.071886115,0.045687664,0.09596071,0.02309955,-0.093928486,0.060846634,-0.010293208,0.0019619183,-0.01262796,0.00903265,-0.023953682,0.10200915,0.047290858,0.045499157,-0.075414516,-0.024221132,0.060803182,-0.09191944,0.011989032,0.021896891,-0.04434021,0.021226257,0.019848466,-0.05852586,0.03497773,-9.0260415e-33,-0.005345117,-0.02905326,0.014672193,0.04659996,-0.02827288,0.013217331,-0.038185596,0.030172179,-0.052595694,-0.016775005,0.0034630296,0.0005796092,-0.020373767,-0.034381017,-0.003368548,0.0013990616,0.051134117,0.018485619,0.08034309,-0.00014359028,-0.013998864,-0.021287004,0.039143343,0.017298104,-0.01783786,-0.012515478,-0.013980072,-0.08343152,-0.026559573,0.024582457,0.028264288,0.020893438,0.045632884,-0.041543033,-0.
10551889,-0.036366448,-0.053493455,-0.055436466,-0.04398083,0.052545346,0.08640961,-0.0042671273,0.017281737,-0.00034555266,0.004699934,-0.034805812,0.008263813,0.020119106,-0.09260096,0.014703147,0.011787516,-0.033072904,0.0042901696,-0.089319356,-0.029248364,-0.041016947,0.05976217,-0.00918999,0.019669672,0.08591935,0.022527535,0.0075523653,-0.030852487,0.0293062,0.051727384,-0.090517566,-0.095217556,-0.04174029,-0.0011758066,0.014292619,-0.024682239,-0.00352193,0.007736276,-0.017399697,0.071428835,-0.01235873,-0.005342253,-0.0033088347,-0.01875911,-0.07966146,0.019006373,0.0018609434,0.0070682094,0.05770639,0.07751444,0.05984159,-0.029955158,-0.0058063027,-0.023169449,0.0026583143,-0.0657157,-0.04399309,0.033948664,-0.027996266,0.040526785,5.238206e-33,0.010241496,0.03607309,0.046909038,0.01363597,-0.005335445,0.0016521378,-0.020371612,0.04564497,-0.08217566,0.06402261,-0.0017092424,0.04467231,0.10069538,0.00045676067,0.062299315,0.037693474,-0.039460365,-0.019606704,0.05026583,-0.05616923,-0.18455043,0.08040067,0.074261375,0.01932381,-0.026447851,0.040501554,-0.019648919,-0.023729222,-0.0589519,-0.08537439,-0.045682527,-0.12889874,-0.05590042,-0.068548314,-0.0058031343,0.06694754,-0.023167383,-0.14575258,-0.0123237,-0.05953811,0.036701642,-0.0021032344,0.048329204,0.078937724,0.01448631,0.029141147,0.014654006,-0.06743169,0.009763479,0.03308005,-0.026131291,-0.008976251,-0.02805068,-0.06251999,-0.003333123,-0.01415754,-0.07179516,-0.06783281,0.014238787,0.008521243,-0.03168489,0.0996435,-0.052023314,0.13799056,-0.01971767,-0.0868198,-0.007109497,-0.055724714,0.011921491,-0.073369175,-0.007965487,0.07029797,-0.031166457,-0.055607323,0.010831625,0.04010842,0.051589157,-0.0015768374,0.037868574,0.015498447,-0.06851171,-0.040853888,0.0092245145,-0.010765777,-0.0015251125,-0.037699535,-0.0050808727,0.05028556,-0.0018061057,0.047179475,-0.032873698,0.07862571,0.021928754,-0.055561442,0.0068104025,-1.601115e-08,-0.047843583,-0.0016648499,-0.0019612422,-0.002554689,0.05
1340975,0.03563475,0.008412877,-0.06416777,-0.031938273,-0.019677943,0.031404965,-0.017351918,-0.043358672,0.020338805,0.10461016,0.025110265,0.01756787,8.452355e-06,0.03481564,0.119492605,-0.07120706,0.014109293,0.07982084,-0.006870619,-0.0052823476,-0.029617291,0.0735672,0.06555545,-0.0973324,0.06841363,-0.03208407,0.109986424,-0.03169939,0.018973589,0.024622567,-0.06959749,0.07099971,-0.0502078,0.044230383,0.021497766,0.057419084,0.12532368,-0.08883316,-0.018113941,0.0011768519,0.06459078,-0.0014821336,-0.09094165,-0.0075864964,-0.00019048726,-0.12415704,-0.06488212,0.09381432,0.051018294,-0.020306533,-0.004231261,-0.01809832,-0.07439528,0.05670538,0.03697211,0.038794994,0.04458422,-0.080352895,-0.030577209]</td>\n",
       "\t</tr>\n",
       "\t<tr>\n",
       "\t\t<td>1</td>\n",
       "\t\t<td>I like eating apples.</td>\n",
       "\t\t<td>[0.021809125,-0.015531936,0.011607823,0.08773645,-0.060896702,-0.035311054,0.11097563,-0.05388054,0.015478587,0.025643231,0.034682132,-0.09349964,0.018253824,0.0032013033,0.04340516,-0.037074324,0.088959105,-0.0040924014,-0.010021067,0.005995189,-0.078318015,0.066143975,0.042326767,-0.027101004,0.017702203,0.047038272,0.069593005,-0.037545256,-0.08466898,-0.0149313845,-0.05919546,2.3295312e-05,0.013309427,0.012327688,-0.05439113,0.0081964955,0.1404407,-0.07974374,-0.04133351,-0.022248574,0.018386977,0.06675908,0.060005367,0.040904358,-0.057686336,-0.008572924,-0.00069316046,-0.017934205,0.09348524,0.04610809,0.042312037,0.004256497,-0.035399776,-0.031868283,0.055097736,0.030634014,0.017477207,0.007607817,0.0028514373,-0.00848901,0.07058604,-0.065969445,-0.0030018615,0.017515391,0.03681229,-0.051015034,-0.051681925,-0.007240641,-0.05672333,-0.00033159592,-0.016689943,0.050976675,0.09232235,0.04870195,-0.023326442,0.014425969,0.0944048,-0.084106356,-0.06532096,0.010295293,-0.060007896,-0.0066203177,0.018760895,0.006218678,-0.016821053,-0.05153683,-0.019194037,0.019247936,-0.05592109,0.07442912,0.0011268753,-0.01857252,-0.03386638,0.048263445,0.0018756357,0.021458386,0.02670066,-0.07195236,-0.035215963,0.09375799,0.009641698,0.03153929,-0.0065211398,0.059988208,0.029077088,0.006436115,-0.16888268,-0.012192926,0.008317657,-0.0010368789,0.020289453,-0.015101345,-0.036400657,-0.0053182426,0.016343204,0.04836311,0.052492023,0.0022888337,0.013867806,-0.011067135,-0.00632472,0.08962689,-0.056332756,-5.0709128e-05,0.00037433414,-0.043979205,0.030548107,-6.1121328e-33,-0.10044726,-0.04796987,0.050677963,-0.031848893,0.017650908,0.0055781733,0.035132997,0.095104724,0.091575645,-0.026064795,-0.0059387633,-0.023844877,-0.03789114,-0.0062694866,0.024072742,-0.06319934,-0.025684576,0.072659574,-0.04208775,-0.014134044,-0.017349942,-0.09240058,-0.0064091305,0.09291195,-0.027069137,-0.08738226,0.042585023,-0.12305703,0.062073898,0.017139783,0.043850746,-0.005554785,-0
.03515969,-0.057963055,-0.0016850779,-0.029315367,0.07211059,0.049894214,-0.028748097,0.0011031141,-0.007046476,0.020515675,0.067191206,0.021492152,0.06486439,0.0060838815,0.025401684,0.0739729,-0.030965952,-0.00762098,-0.04577821,-0.048278432,0.09053183,0.03222762,-0.015725326,-0.0107247075,0.013521895,-0.0360384,-0.09246122,0.01310438,-0.07853673,0.049683314,0.008800179,-0.007872623,-0.11311235,0.11412768,-0.03581802,-0.047303308,0.014969714,0.02396507,-0.04279115,0.031482812,-0.022683978,0.0005804888,-0.11246332,-0.09786996,0.045210768,-0.03159177,-0.05506939,-0.023562698,0.052014776,-0.0024514296,0.003902688,-0.010034765,0.033652794,0.122117504,-0.06718425,-0.066750795,0.108197525,-0.015414996,0.00400915,0.021052254,0.016455496,0.019499239,-0.12814386,5.5857067e-33,-0.0018572184,-0.080794044,-0.013305333,0.01841107,-0.037682977,-0.067594446,-0.087071694,0.013579439,-0.02803438,-0.032445576,-0.026130464,-0.006865185,-0.022305872,-0.016416714,0.023153791,0.024428565,-0.011959914,0.093689434,-0.032577604,0.026465515,-0.046098784,0.008481788,-0.006716845,0.019120447,0.016167238,-0.02313292,-0.0042774454,0.043933924,-0.018111937,0.059962064,0.05109594,-0.07903502,-0.059705768,-0.13360032,0.04902079,0.035442233,-0.09378037,-0.056613892,-0.0022577408,0.030770848,0.015449528,0.0032539356,0.031303164,0.11281754,0.036288805,0.09346795,0.0313906,0.058778953,0.022154897,0.05777495,0.00097196305,-0.02609103,-0.06628839,0.015047393,0.03955508,0.0523623,0.0069718817,0.0009399279,-0.039598145,-0.07549803,-0.102647424,0.06432405,0.018766917,0.013961236,0.06031335,-0.02941947,-0.03033608,-0.053566907,-0.07672768,0.012401397,-0.009276499,-0.054574206,-0.0566019,-0.024081016,-0.039790105,-0.035410725,0.011844968,0.036265053,-0.08490442,0.058963377,-0.030408578,0.10739632,0.010045292,0.06581671,0.049952522,0.05613914,-0.018259415,0.023479586,-0.04595969,0.03890778,-0.005904789,-0.015094089,0.013457783,-0.03914847,0.011510677,-1.5212935e-08,-0.045827758,-0.029699294,0.03503024,-0.010
878928,-0.003190462,0.07422462,-0.07662787,0.05413322,0.02137874,-0.040636785,0.062867135,0.08551578,-0.08906489,0.05611474,0.048328113,0.008293776,0.08469364,-0.027762407,-0.015386819,0.067916475,-0.0937729,0.018911839,-0.013140985,0.04376479,-0.018527055,0.021828363,0.0024259402,0.020919863,0.1057404,0.063920595,0.056231383,0.053664792,-0.08300249,0.068553776,-0.0059213005,-0.0768514,0.010081414,-0.011377745,-0.012504746,-0.10047471,-0.049601573,-0.002936166,0.015598577,-0.042786237,-0.0998226,0.022823302,0.063844405,0.011207117,0.020726835,0.08571722,0.041427787,0.026192738,0.09660777,0.08237022,0.036912948,-0.014799402,0.043485742,-0.07760759,0.015751759,0.07816933,0.117991626,0.058715604,0.021846006,-0.016581282]</td>\n",
       "\t</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n",
       " id | content               | embedding_col                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              \n",
       "----+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n",
       "  0 | I have a dog.         | [-0.03659846,-0.012087725,0.08805456,0.06115138,-0.043457743,-0.01559289,0.07047544,-0.002039723,0.08257614,-0.027372131,0.04414288,-0.03269939,0.013636172,0.041616578,0.01041031,-0.0015929971,-0.06705982,-0.04409838,-0.0057846354,-0.064064376,-0.065876596,0.07500749,0.012162328,-0.005788606,-0.10990597,0.027304182,-0.039163485,-0.05016219,0.0029829051,-0.03839179,-0.015229771,-0.055909265,-0.011802612,-0.004877074,-0.042732246,-0.041694522,0.0065653333,-0.013692751,0.10301593,0.080455385,0.04717231,0.014515034,0.06301975,-0.008371313,-0.0037640464,0.037010957,-0.08730185,-0.019860014,0.116165005,-0.00917515,-0.029422058,0.057609066,-0.017986145,0.030363863,-0.018659862,-0.02392018,0.0075364686,0.030293366,-0.0017754078,-0.02729239,0.010452815,0.06776974,0.009428493,0.0472378,0.00020276332,0.020744024,-0.06177006,0.06334934,-0.0663036,0.055175003,0.036360443,0.03362923,0.04471841,0.09541594,-0.03548978,-0.107487485,0.06386296,-0.030471282,0.1804235,0.07920901,-0.08705953,-0.06174665,-0.042777628,0.04772076,0.0404552,0.011489488,0.07283992,0.06658615,-0.117522426,0.011569878,-0.02257866,-0.049202636,-0.03411388,0.017634219,-0.0032649843,-0.01003333,-0.022944538,-0.03394829,-0.021662673,0.08960133,0.0084434915,0.028806066,0.071886115,0.045687664,0.09596071,0.02309955,-0.093928486,0.060846634,-0.010293208,0.0019619183,-0.01262796,0.00903265,-0.023953682,0.10200915,0.047290858,0.045499157,-0.075414516,-0.024221132,0.060803182,-0.09191944,0.011989032,0.021896891,-0.04434021,0.021226257,0.019848466,-0.05852586,0.03497773,-9.0260415e-33,-0.005345117,-0.02905326,0.014672193,0.04659996,-0.02827288,0.013217331,-0.038185596,0.030172179,-0.052595694,-0.016775005,0.0034630296,0.0005796092,-0.020373767,-0.034381017,-0.003368548,0.0013990616,0.051134117,0.018485619,0.08034309,-0.00014359028,-0.013998864,-0.021287004,0.039143343,0.017298104,-0.01783786,-0.012515478,-0.013980072,-0.08343152,-0.026559573,0.024582457,0.028264288,0.020893438,0.0456
32884,-0.041543033,-0.10551889,-0.036366448,-0.053493455,-0.055436466,-0.04398083,0.052545346,0.08640961,-0.0042671273,0.017281737,-0.00034555266,0.004699934,-0.034805812,0.008263813,0.020119106,-0.09260096,0.014703147,0.011787516,-0.033072904,0.0042901696,-0.089319356,-0.029248364,-0.041016947,0.05976217,-0.00918999,0.019669672,0.08591935,0.022527535,0.0075523653,-0.030852487,0.0293062,0.051727384,-0.090517566,-0.095217556,-0.04174029,-0.0011758066,0.014292619,-0.024682239,-0.00352193,0.007736276,-0.017399697,0.071428835,-0.01235873,-0.005342253,-0.0033088347,-0.01875911,-0.07966146,0.019006373,0.0018609434,0.0070682094,0.05770639,0.07751444,0.05984159,-0.029955158,-0.0058063027,-0.023169449,0.0026583143,-0.0657157,-0.04399309,0.033948664,-0.027996266,0.040526785,5.238206e-33,0.010241496,0.03607309,0.046909038,0.01363597,-0.005335445,0.0016521378,-0.020371612,0.04564497,-0.08217566,0.06402261,-0.0017092424,0.04467231,0.10069538,0.00045676067,0.062299315,0.037693474,-0.039460365,-0.019606704,0.05026583,-0.05616923,-0.18455043,0.08040067,0.074261375,0.01932381,-0.026447851,0.040501554,-0.019648919,-0.023729222,-0.0589519,-0.08537439,-0.045682527,-0.12889874,-0.05590042,-0.068548314,-0.0058031343,0.06694754,-0.023167383,-0.14575258,-0.0123237,-0.05953811,0.036701642,-0.0021032344,0.048329204,0.078937724,0.01448631,0.029141147,0.014654006,-0.06743169,0.009763479,0.03308005,-0.026131291,-0.008976251,-0.02805068,-0.06251999,-0.003333123,-0.01415754,-0.07179516,-0.06783281,0.014238787,0.008521243,-0.03168489,0.0996435,-0.052023314,0.13799056,-0.01971767,-0.0868198,-0.007109497,-0.055724714,0.011921491,-0.073369175,-0.007965487,0.07029797,-0.031166457,-0.055607323,0.010831625,0.04010842,0.051589157,-0.0015768374,0.037868574,0.015498447,-0.06851171,-0.040853888,0.0092245145,-0.010765777,-0.0015251125,-0.037699535,-0.0050808727,0.05028556,-0.0018061057,0.047179475,-0.032873698,0.07862571,0.021928754,-0.055561442,0.0068104025,-1.601115e-08,-0.047843583,-0.0016648499,-0.001961
2422,-0.002554689,0.051340975,0.03563475,0.008412877,-0.06416777,-0.031938273,-0.019677943,0.031404965,-0.017351918,-0.043358672,0.020338805,0.10461016,0.025110265,0.01756787,8.452355e-06,0.03481564,0.119492605,-0.07120706,0.014109293,0.07982084,-0.006870619,-0.0052823476,-0.029617291,0.0735672,0.06555545,-0.0973324,0.06841363,-0.03208407,0.109986424,-0.03169939,0.018973589,0.024622567,-0.06959749,0.07099971,-0.0502078,0.044230383,0.021497766,0.057419084,0.12532368,-0.08883316,-0.018113941,0.0011768519,0.06459078,-0.0014821336,-0.09094165,-0.0075864964,-0.00019048726,-0.12415704,-0.06488212,0.09381432,0.051018294,-0.020306533,-0.004231261,-0.01809832,-0.07439528,0.05670538,0.03697211,0.038794994,0.04458422,-0.080352895,-0.030577209]        \n",
       "  1 | I like eating apples. | [0.021809125,-0.015531936,0.011607823,0.08773645,-0.060896702,-0.035311054,0.11097563,-0.05388054,0.015478587,0.025643231,0.034682132,-0.09349964,0.018253824,0.0032013033,0.04340516,-0.037074324,0.088959105,-0.0040924014,-0.010021067,0.005995189,-0.078318015,0.066143975,0.042326767,-0.027101004,0.017702203,0.047038272,0.069593005,-0.037545256,-0.08466898,-0.0149313845,-0.05919546,2.3295312e-05,0.013309427,0.012327688,-0.05439113,0.0081964955,0.1404407,-0.07974374,-0.04133351,-0.022248574,0.018386977,0.06675908,0.060005367,0.040904358,-0.057686336,-0.008572924,-0.00069316046,-0.017934205,0.09348524,0.04610809,0.042312037,0.004256497,-0.035399776,-0.031868283,0.055097736,0.030634014,0.017477207,0.007607817,0.0028514373,-0.00848901,0.07058604,-0.065969445,-0.0030018615,0.017515391,0.03681229,-0.051015034,-0.051681925,-0.007240641,-0.05672333,-0.00033159592,-0.016689943,0.050976675,0.09232235,0.04870195,-0.023326442,0.014425969,0.0944048,-0.084106356,-0.06532096,0.010295293,-0.060007896,-0.0066203177,0.018760895,0.006218678,-0.016821053,-0.05153683,-0.019194037,0.019247936,-0.05592109,0.07442912,0.0011268753,-0.01857252,-0.03386638,0.048263445,0.0018756357,0.021458386,0.02670066,-0.07195236,-0.035215963,0.09375799,0.009641698,0.03153929,-0.0065211398,0.059988208,0.029077088,0.006436115,-0.16888268,-0.012192926,0.008317657,-0.0010368789,0.020289453,-0.015101345,-0.036400657,-0.0053182426,0.016343204,0.04836311,0.052492023,0.0022888337,0.013867806,-0.011067135,-0.00632472,0.08962689,-0.056332756,-5.0709128e-05,0.00037433414,-0.043979205,0.030548107,-6.1121328e-33,-0.10044726,-0.04796987,0.050677963,-0.031848893,0.017650908,0.0055781733,0.035132997,0.095104724,0.091575645,-0.026064795,-0.0059387633,-0.023844877,-0.03789114,-0.0062694866,0.024072742,-0.06319934,-0.025684576,0.072659574,-0.04208775,-0.014134044,-0.017349942,-0.09240058,-0.0064091305,0.09291195,-0.027069137,-0.08738226,0.042585023,-0.12305703,0.062073898,0.017139783,0.043
850746,-0.005554785,-0.03515969,-0.057963055,-0.0016850779,-0.029315367,0.07211059,0.049894214,-0.028748097,0.0011031141,-0.007046476,0.020515675,0.067191206,0.021492152,0.06486439,0.0060838815,0.025401684,0.0739729,-0.030965952,-0.00762098,-0.04577821,-0.048278432,0.09053183,0.03222762,-0.015725326,-0.0107247075,0.013521895,-0.0360384,-0.09246122,0.01310438,-0.07853673,0.049683314,0.008800179,-0.007872623,-0.11311235,0.11412768,-0.03581802,-0.047303308,0.014969714,0.02396507,-0.04279115,0.031482812,-0.022683978,0.0005804888,-0.11246332,-0.09786996,0.045210768,-0.03159177,-0.05506939,-0.023562698,0.052014776,-0.0024514296,0.003902688,-0.010034765,0.033652794,0.122117504,-0.06718425,-0.066750795,0.108197525,-0.015414996,0.00400915,0.021052254,0.016455496,0.019499239,-0.12814386,5.5857067e-33,-0.0018572184,-0.080794044,-0.013305333,0.01841107,-0.037682977,-0.067594446,-0.087071694,0.013579439,-0.02803438,-0.032445576,-0.026130464,-0.006865185,-0.022305872,-0.016416714,0.023153791,0.024428565,-0.011959914,0.093689434,-0.032577604,0.026465515,-0.046098784,0.008481788,-0.006716845,0.019120447,0.016167238,-0.02313292,-0.0042774454,0.043933924,-0.018111937,0.059962064,0.05109594,-0.07903502,-0.059705768,-0.13360032,0.04902079,0.035442233,-0.09378037,-0.056613892,-0.0022577408,0.030770848,0.015449528,0.0032539356,0.031303164,0.11281754,0.036288805,0.09346795,0.0313906,0.058778953,0.022154897,0.05777495,0.00097196305,-0.02609103,-0.06628839,0.015047393,0.03955508,0.0523623,0.0069718817,0.0009399279,-0.039598145,-0.07549803,-0.102647424,0.06432405,0.018766917,0.013961236,0.06031335,-0.02941947,-0.03033608,-0.053566907,-0.07672768,0.012401397,-0.009276499,-0.054574206,-0.0566019,-0.024081016,-0.039790105,-0.035410725,0.011844968,0.036265053,-0.08490442,0.058963377,-0.030408578,0.10739632,0.010045292,0.06581671,0.049952522,0.05613914,-0.018259415,0.023479586,-0.04595969,0.03890778,-0.005904789,-0.015094089,0.013457783,-0.03914847,0.011510677,-1.5212935e-08,-0.045827758,-0.02969
9294,0.03503024,-0.010878928,-0.003190462,0.07422462,-0.07662787,0.05413322,0.02137874,-0.040636785,0.062867135,0.08551578,-0.08906489,0.05611474,0.048328113,0.008293776,0.08469364,-0.027762407,-0.015386819,0.067916475,-0.0937729,0.018911839,-0.013140985,0.04376479,-0.018527055,0.021828363,0.0024259402,0.020919863,0.1057404,0.063920595,0.056231383,0.053664792,-0.08300249,0.068553776,-0.0059213005,-0.0768514,0.010081414,-0.011377745,-0.012504746,-0.10047471,-0.049601573,-0.002936166,0.015598577,-0.042786237,-0.0998226,0.022823302,0.063844405,0.011207117,0.020726835,0.08571722,0.041427787,0.026192738,0.09660777,0.08237022,0.036912948,-0.014799402,0.043485742,-0.07760759,0.015751759,0.07816933,0.117991626,0.058715604,0.021846006,-0.016581282] \n",
       "------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n",
       "(2 rows)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.create_dataframe(columns={\"id\": range(len(content)), \"content\": content}).assign(\n",
    "    embedding_col=lambda t: (\n",
    "        gp.type_(\"vector\", modifier=384)(create_embedding(t[\"content\"], \"all-MiniLM-L6-v2\"))\n",
    "    ),\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Semantic Search by Embeddings\n",
    "\n",
    "With the embedding index, we can search for content based on semantic similarity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table>\n",
       "\t<tr>\n",
       "\t\t<th>id</th>\n",
       "\t\t<th>content</th>\n",
       "\t</tr>\n",
       "\t<tr>\n",
       "\t\t<td>1</td>\n",
       "\t\t<td>I like eating apples.</td>\n",
       "\t</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "----------------------------\n",
       " id | content               \n",
       "----+-----------------------\n",
       "  1 | I like eating apples. \n",
       "----------------------------\n",
       "(1 row)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "t.embedding().search(column=\"content\", query=\"apple\", top_k=1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The search is efficient because the embedding index spares us from scanning all the data."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleaning All at Once\n",
    "\n",
    "To ease management, the dependency between the embedding index and the base table is recorded in the database.\n",
    "\n",
    "As a result, trying to drop the base table alone will fail:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "vscode": {
     "languageId": "sql"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * postgresql://localhost:7000\n",
      "(psycopg2.errors.DependentObjectsStillExist) cannot drop table text_sample because other objects depend on it\n",
      "DETAIL:  table cte_32a769763ae94cd9b4036ceb590c4f0d depends on table text_sample\n",
      "HINT:  Use DROP ... CASCADE to drop the dependent objects too.\n",
      "\n",
      "[SQL: DROP TABLE text_sample]\n",
      "(Background on this error at: https://sqlalche.me/e/20/2j85)\n"
     ]
    }
   ],
   "source": [
    "%reload_ext sql\n",
    "%sql postgresql://localhost:7000\n",
    "%sql DROP TABLE text_sample"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To drop the base table, we also need to drop the embedding index. This can be achieved with `CASCADE`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "vscode": {
     "languageId": "sql"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * postgresql://localhost:7000\n",
      "Done.\n",
      "0 rows affected.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>oid</th>\n",
       "            <th>relname</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%sql\n",
    "DROP TABLE text_sample CASCADE;\n",
    "\n",
    "SELECT oid, relname\n",
    "FROM gp_dist_random('pg_class')\n",
    "WHERE relname = 'cte_32a769763ae94cd9b4036ceb590c4f0d';"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As the empty result shows, after `DROP TABLE ... CASCADE` the embedding index is also dropped on all segments."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
